TEI Exploration Server


Teaching & Learning Guide for: Corpus Linguistics in the UK: Resources for Sociolinguistic Research



Authors: Wendy Anderson [United Kingdom]


RBID : ISTEX:1570F6442817FA2766D019A0CACE13E47E28832A

Abstract

Author's Introduction:

Linguistics has drawn on the large quantities of authentic data contained in language corpora for several decades now. While debates continue regarding the nature and interpretation of such data, it is generally accepted that corpus methodologies offer a valuable perspective on language, one that complements the introspective and elicited data used in different sub‐fields of linguistics. Increasingly, language corpora can be searched or downloaded over the Internet, and are now therefore very readily accessible. Many also include demographic or textual metadata that make them invaluable as data for sociolinguistics. While existing corpora may have some drawbacks (e.g. where the corpus design is not ideally suited to the study in hand, or available corpora do not have appropriate mark‐up), they offer great savings in time and effort compared to creating a new corpus. Moreover, especially given the increasing availability of spoken texts in corpora, they constitute excellent resources for students of different levels, for teachers looking for a quick way to demonstrate a feature of language, and for researchers testing linguistic hypotheses.

Author Recommends:

1. Wynne, Martin. (ed.) 2005. Developing linguistic corpora: a guide to good practice. Oxford: Oxbow Books. Available online from http://ahds.ac.uk/linguistic-corpora/.
This AHDS Guide to Good Practice gives an up‐to‐date overview of many of the issues involved in creating corpora, and is essential reading for corpus users as well as for corpus creators, whether on a large or small scale. The six chapters and supplementary material are all written by experts in the topics covered, which range from metadata, spoken language corpora and annotation, to the preservation and distribution of corpora.

2. Adolphs, Svenja. 2006. Introducing electronic text analysis. London and New York: Routledge.
This introduction takes a very practical approach to the investigation of both literary and non‐literary texts using computers, and I recommend it highly for beginners, such as undergraduates in linguistics or humanities computing. Routledge's companion website contains links to online corpora and analysis software that encourage readers to carry out their own studies inspired by the many examples in the book.

3. McEnery, Tony, Richard Xiao, and Yukio Tono. 2006. Corpus‐based language studies: an advanced resource book. London and New York: Routledge.
This is a really excellent book, which gives a very broad overview of what corpus linguistics is, how corpora can be used, and the research that has been done on corpora in the past. In common with the other books in the Routledge Applied Linguistics series, it is structured around ‘Introduction’, ‘Extension’ and ‘Exploration’ sections of 6–10 units each, which combine detailed discussion, extracts from key readings in the field, and tasks for students.

4. Tagliamonte, Sali A. 2006. Analysing sociolinguistic variation. Cambridge: Cambridge University Press.
Tagliamonte has created and worked with a number of corpora of English, and treats corpora here as one source of data that can be combined with various others (such as sociolinguistic interviews and elicited data) to carry out in‐depth sociolinguistic analysis. This book makes an excellent introduction to sociolinguistic methods for advanced undergraduates and postgraduates.

5. O’Keeffe, Anne, Michael McCarthy, and Ronald Carter. 2007. From corpus to classroom: language use and language teaching. Cambridge: Cambridge University Press.
This textbook draws primarily on the Cambridge and Nottingham Corpus of Discourse in English (CANCODE) and the Cambridge International Corpus. It demonstrates, through enthusiastic discussion and many examples, how corpus data can inform language teaching.

6. Sampson, Geoffrey, and Diana McCarthy. (eds). 2004. Corpus linguistics: readings in a widening discipline. London and New York: Continuum.
This is a collection of 42 key research articles from half a century of corpus linguistics, and touches on the field from almost every possible angle. Sociolinguists will certainly find several threads to interest them, but the real strength of this book is in the convenience of having articles by so many of the most influential corpus researchers and theorists together in one volume. I recommend it highly to anyone intending to become seriously involved with corpora.

7. Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus linguistics: investigating language structure and use. Cambridge: Cambridge University Press.
This is a very useful introductory book, with a strong focus on investigating varieties and variation. A series of methodology boxes at the end of the book sets out important concepts such as concordancing, tagging, and statistical measures used in corpus linguistics.

8. Beal, Joan C., Karen P. Corrigan, and Hermann L. Moisl. (eds). 2007. Creating and digitizing language corpora. Volume 1: synchronic databases, Volume 2: diachronic databases. Basingstoke: Palgrave Macmillan.
These two volumes bring together papers delivered at a workshop held in Newcastle in 2004, along with additional invited contributions. Both the synchronic volume and the diachronic volume contain descriptions of corpus work relevant to sociolinguists, and together give a detailed overview of the diverse work underway on what the editors call ‘unconventional’ language data, which encompass dialect material and child language among other types. The focus is largely on Europe and the USA, but extends far beyond corpora of English.

9. Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
A classic text in corpus linguistics, which gives the reader a good understanding of not just how to do corpus linguistics but also why.
Technology has of course moved on since this book was written, and corpora look quite different now, but the basic principles have changed little. Also recommended are John Sinclair's other works, including Trust the Text: Language, Corpus and Discourse (Routledge, 2004) and, for developing interpretative skills, Reading Concordances (Longman, 2003).

10. McEnery, Tony. 2005. Swearing in English: Bad Language, Purity and Power from 1586 to the Present. London and New York: Routledge.
Naturally the subject matter here is relatively narrow, but this book is a classic demonstration of how corpus methodology can contribute to an in‐depth study of a language phenomenon. Swearing and bad language are closely correlated with social context. The principal data drawn on here by McEnery is the spoken component of the British National Corpus.

11. Anderson, Wendy, and John Corbett. To appear 2009. Exploring English with Online Corpora. Basingstoke: Palgrave Macmillan.
This book is a basic introduction to the use of online corpora, for students and teachers with little or no previous knowledge. It surveys available online corpora of English, and each chapter contains a series of interactive tasks focusing on levels of language from pronunciation to discourse.

Online Materials:

1. Linguist List web resources for texts and corpora
http://www.linguistlist.org/sp/Texts.html
Linguist List is a widely‐used portal for finding information and resources in all areas of linguistics. The site also runs a worldwide mailing list that is a first port of call for finding out about new publications, current research, jobs, and topics currently being debated. This link is to Linguist List's catalogue of text and corpus resources, including software, which is regularly maintained.

2. British National Corpus (BNC)
http://www.natcorp.ox.ac.uk/
This is the homepage of the BNC. It contains detailed information about the corpus and its availability, and features a simple search facility that allows you to retrieve up to 50 random hits of a search term in the entire corpus or in user‐specified sub‐corpora. More complex queries, integrating part of speech information, are also possible.

3. Mark Davies’ interface to the British National Corpus
http://corpus.byu.edu/bnc/
This interface, run by Professor Mark Davies at Brigham Young University, Utah, provides a very attractive and flexible way of analysing the BNC, including comparing registers, searching by part of speech, and analysing collocates. Also available from http://corpus.byu.edu are the Corpus of Contemporary American English and the TIME corpus, which use the same interface as the BNC, as well as several corpora of other languages.

4. International Corpus of English (ICE)
http://ucl.ac.uk/english-usage/ice/
The International Corpus of English (ICE) is made up of a set of 1 million‐word corpora of national or regional varieties of English that follow a common design and are therefore readily comparable. Some of the component corpora are available for download from this site; others may be obtained on CD‐ROM; a further number are in the process of creation. Some sample sound files are also available here.

5. The Newcastle Electronic Corpus of Tyneside English (NECTE)
http://www.ncl.ac.uk/necte/
The Newcastle Electronic Corpus of Tyneside English is a TEI‐conformant corpus of speech spanning 30 years from the North East of England. This webpage describes the corpus and gives details of its availability. It cannot be searched online, but can be obtained free of charge by researchers.

6. Scottish Corpus of Texts & Speech (SCOTS)
http://www.scottishcorpus.ac.uk
SCOTS is a corpus of texts in Scottish English and varieties of Scots.
Twenty percent of the 4 million‐word corpus is made up of spoken language, and is presented as audio or audio‐video files with searchable orthographic transcriptions. Full corpus texts are available and may be analysed with integrated search tools or downloaded. Detailed textual and demographic metadata accompany each text.

7. The IViE Corpus, English Intonation in the British Isles
http://www.phon.ox.ac.uk/IViE/
IViE is an excellent resource for phonetic and phonological analysis, and was created for the Intonational Variation in English project at the University of Oxford. The data represent different levels of spontaneity – read speech data, semi‐spontaneous speech data and interactive speech (map task) – with informants from nine urban areas in the British Isles, including Dublin, Belfast, London and Newcastle.

8. Freiburg English Dialect (FRED) Corpus
http://www2.anglistik.uni-freiburg.de/institut/lskortmann/FRED/index.htm
The FRED homepage contains detailed documentation on the corpus and its availability, as well as several sample texts and audio files that can be downloaded. The complete corpus samples nine dialect areas in the UK, and totals 300 hours of speech.

9. Text Encoding Initiative (TEI)
http://www.tei-c.org
The TEI consortium has developed a widely‐used standard for representing texts in digital form. The TEI website includes detailed mark‐up guidelines, as well as resources for learning how to implement them. The site also maintains a list of links to projects that have used TEI mark‐up: many of these projects are good sources of texts for building corpora to use in linguistic research and teaching.

10. Text Analysis Portal for Research (TAPoR)
http://taporware.mcmaster.ca/
The TAPoR text analysis tools, developed by Geoffrey Rockwell, are a suite of programmes that can be used over the web on texts or small corpora specified by the user, in plain text, XML or HTML format.
The programmes are varied, and include tools for creating word lists, concordancing, identifying collocations, and examining the distribution of words through texts, among many others.

Sample Unit: Corpora and Variation

Introduction:

This practical course provides an introduction to corpus methodology for students familiar with the concepts and methods of studying variation, perhaps as a prelude to creating their own small corpus as data for a larger sociolinguistic study or dissertation. Most sessions are best suited to a computer classroom, in which students have access to a number of appropriate language corpora. The course can be easily modified to use corpora available to the group, for example, using only free online resources, or networked corpora available through an institutional licence. Students carry out each week's practical work in pairs or small groups: each pair or group then develops one study into a short presentation to be delivered in the final weeks of the course.

Syllabus:

Week 1: Introduction to corpora
This session concentrates on the nature of data in studying linguistic variation, considering the advantages and disadvantages of corpora over introspection, elicited data and sociolinguistic interviews.
Suggested reading:
Hunston, Susan. 2006. Corpus linguistics. In Brown, Keith et al. (eds). Encyclopedia of language and linguistics. Second edition. Volume 3. Amsterdam: Elsevier. 234–248.
McEnery, Tony and Andrew Wilson. 2001. Corpus linguistics. 2nd edition. Edinburgh: Edinburgh University Press.
Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Week 2: Corpora and linguistic variation
This week's discussion will consider the suitability of corpora for the study of linguistic variation, with a particular focus on demographic metadata and the notion of corpus representativeness.
Suggested reading:
Anderson, Wendy. 2008. Corpus linguistics in the UK: resources for sociolinguistic research. Language and Linguistics Compass 2/2: 352–371.
Bauer, Laurie. 2002. Inferring Variation and Change from Public Corpora. In J. K. Chambers, Peter Trudgill and Natalie Schilling‐Estes (eds.). The handbook of language variation and change. Oxford: Blackwell. 97–114.
Sigley, Robert. 2006. Corpora in studies of variation. In Keith Brown et al. (eds). Encyclopedia of language and linguistics. Second edition. Volume 3. Amsterdam: Elsevier. 220–226.

Week 3: A general corpus of UK English – British National Corpus
Individually or in pairs, students familiarize themselves with the British National Corpus and its search interface at http://corpus.byu.edu/bnc/.
Example questions:
•  Does the collocational patterning of the word window vary according to textual register?
•  What words follow red in journalism? And in tabloid journalism specifically?
Suggested reading:
Anderwald, Lieselotte. 2001. Was/were‐variation in non‐standard British English today. English World‐wide 22(1): 1–21.
British National Corpus, description of corpus at http://www.natcorp.ox.ac.uk/
Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Week 4: A general corpus of US English – Corpus of Contemporary American English
This session looks at language use as represented in the Corpus of Contemporary American English (http://www.americancorpus.org/), in comparison with the British National Corpus.
Example questions:
•  Investigate some of the well‐known lexical differences between US and UK English.
•  Compare UK and US usage of singular and plural verbs after collective nouns (e.g. The staff is/are; the team is/are).
Suggested reading:
Corpus of Contemporary American English, description of corpus, introduction and brief tour at http://www.americancorpus.org/
Kirkpatrick, Andy. 2007. World Englishes: Implications for International Communication and English Language Teaching. Cambridge: Cambridge University Press.
Week 5: Studying pronunciation with the IViE corpus
Using the IViE corpus (http://www.phon.ox.ac.uk/IViE/), investigate in what positions /r/ is pronounced (rhoticity) across a range of urban varieties of English in the British Isles.
Suggested reading:
Ladefoged, Peter & Ian Maddieson. 1996. The sounds of the world's languages. Oxford: Blackwell.

Week 6: Language and age – the SCOTS corpus
Using the advanced search facility of the SCOTS corpus (http://www.scottishcorpus.ac.uk) to search the spoken component of the corpus, investigate ways in which language is influenced by the age of the speaker.
Example questions:
•  What intensifiers (e.g. very, utterly, absolutely and definitely) are typical of the speech of older and younger speakers respectively?
•  Compare the use of the word‐form like by speakers of different age groups.
Suggested reading:
Anderson, Wendy. 2006. ‘Absolutely, totally, filled to the brim with the Famous Grouse’: Intensifying adverbs in SCOTS. English Today 22(3): 10–16.
Ito, Rika and Sali Tagliamonte. 2003. Well weird, right dodgy, very strange, really cool: Layering and recycling in English intensifiers. Language in Society 32: 257–279.
Levey, Stephen. 2003. He's like ‘Do it now!’ and I'm like ‘No!’, some innovative quotative usage among young people in London. English Today 19(1): 24–32.
Macaulay, Ronald. 2002. Extremely interesting, very interesting, or only quite interesting? Adverbs and social class. Journal of Sociolinguistics 6(3): 398–417.

Week 7: Variation within a domain – the MICASE corpus
Use the full texts available in the Michigan Corpus of Academic Spoken English (MICASE, at http://quod.lib.umich.edu/m/micase/) to investigate how students and tutors ask questions of each other in different sub‐genres of academic discourse, such as lectures, seminars and office‐hour meetings.
Suggested viewing:
MICASE demos on YouTube: http://www.youtube.com/watch?v=dQEsX8p0wtY

Week 8: Variation over time – the TIME corpus
Using the TIME corpus, available from http://corpus.byu.edu/time, investigate some words whose meaning has changed over time. You will find various suggestions of interesting words in the information accompanying the corpus, created by Mark Davies.
Suggested reading:
Leech, Geoffrey and Nicholas Smith. 2005. Extending the possibilities of corpus‐based research on English in the twentieth century: a prequel to LOB and FLOB. ICAME Journal 29: 83–98.

Week 9: Group presentations of chosen sociolinguistic investigation – I
In groups, students should choose one of the corpora used in Weeks 3–8, carry out a small study which uses this corpus as data, and present their findings to the rest of the class.

Week 10: Group presentations of chosen sociolinguistic investigation – II
As Week 9 above.

Focus Questions:

1. What practical and theoretical challenges do corpus compilers face, and why?
2. How important is metadata to a corpus?
3. What sorts of linguistic information can be obtained from large text corpora that cannot easily be obtained from sociolinguistic interviews?
4. What issues arise in using corpora to investigate the relationship between language and society?
5. For what sorts of research is a grammatically tagged corpus useful, and why is there no single system for marking up corpora?
6. For either a written text or passage of spoken language, what are the contextual factors that have an influence on the language used?
7. What genres of text would you expect to find in a general corpus of a national standard variety of English?

Seminar/Project Idea:

Group Project: Language and society in corpora
Identify a feature of language that seems to you to be particular to a variety of English with which you are familiar, or which you believe is typical of a type of speaker (e.g. age group, gender or profession). Depending on the corpora available, you might think about features of lexis, phraseology, grammar or pronunciation. Investigate the feature you have identified using an available corpus. Make sure to consider how appropriate your chosen corpus is for the study, and therefore how generalizable the findings of your analysis are. Is the feature used throughout the variety, or does its use appear to be affected by other factors, such as the level of formality, textual register, or speaker‐related variables?
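
The concordancing and collocation tools mentioned under Online Materials, and the Week 3 question about which words follow red, can be illustrated with a minimal Python sketch. It is not the method used by any of the corpus interfaces listed above: the two sample texts are invented stand-ins for registers rather than corpus data, the tokenizer is deliberately crude, and the function names (tokenize, following_words, kwic) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (a very simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def following_words(tokens, node):
    """Count the word that immediately follows each occurrence of `node`."""
    return Counter(
        tokens[i + 1]
        for i in range(len(tokens) - 1)
        if tokens[i] == node
    )


def kwic(tokens, node, width=3):
    """Return key-word-in-context lines with `width` tokens on either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


# Invented sample sentences standing in for two registers; NOT real corpus data.
samples = {
    "news": "The minister cut the red tape. Fans waved a red card. More red tape followed.",
    "fiction": "She wore a red dress and carried a red rose.",
}

for register, text in samples.items():
    tokens = tokenize(text)
    # Most frequent right-hand neighbours of the node word in this register.
    print(register, following_words(tokens, "red").most_common())
    for line in kwic(tokens, "red"):
        print("   ", line)
```

With a real corpus, raw counts like these would normally be compared only after normalizing for the size of each register or sub-corpus, and a part-of-speech-tagged corpus would allow the query to be restricted to, say, nouns following the node word.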

DOI: 10.1111/j.1749-818X.2008.00106.x






The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Teaching &amp; Learning Guide for: Corpus Linguistics in the UK: Resources for Sociolinguistic Research</title>
<author>
<name sortKey="Anderson, Wendy" sort="Anderson, Wendy" uniqKey="Anderson W" first="Wendy" last="Anderson">Wendy Anderson</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:1570F6442817FA2766D019A0CACE13E47E28832A</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1111/j.1749-818X.2008.00106.x</idno>
<idno type="url">https://api.istex.fr/document/1570F6442817FA2766D019A0CACE13E47E28832A/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000256</idno>
<idno type="wicri:Area/Istex/Curation">000256</idno>
<idno type="wicri:Area/Istex/Checkpoint">000066</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000066</idno>
<idno type="wicri:doubleKey">1749-818X:2009:Anderson W:teaching:amp:learning</idno>
<idno type="wicri:Area/Main/Merge">000097</idno>
<idno type="wicri:Area/Main/Curation">000097</idno>
<idno type="wicri:Area/Main/Exploration">000097</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Teaching &amp; Learning Guide for: Corpus Linguistics in the UK: Resources for Sociolinguistic Research</title>
<author>
<name sortKey="Anderson, Wendy" sort="Anderson, Wendy" uniqKey="Anderson W" first="Wendy" last="Anderson">Wendy Anderson</name>
<affiliation wicri:level="4">
<country>United Kingdom</country>
<placeName>
<settlement type="city">Glasgow</settlement>
<region type="country">Scotland</region>
</placeName>
<orgName type="university">University of Glasgow</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Language and Linguistics Compass</title>
<idno type="ISSN">1749-818X</idno>
<idno type="eISSN">1749-818X</idno>
<imprint>
<publisher>Blackwell Publishing Ltd</publisher>
<pubPlace>Oxford, UK</pubPlace>
<date type="published" when="2009-01">2009-01</date>
<biblScope unit="volume">3</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="509">509</biblScope>
<biblScope unit="page" to="516">516</biblScope>
</imprint>
<idno type="ISSN">1749-818X</idno>
</series>
<idno type="istex">1570F6442817FA2766D019A0CACE13E47E28832A</idno>
<idno type="DOI">10.1111/j.1749-818X.2008.00106.x</idno>
<idno type="ArticleID">LNC3106</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1749-818X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Author's Introduction: Linguistics has drawn on the large quantities of authentic data contained in language corpora for several decades now. While debates continue regarding the nature and interpretation of such data, it is generally accepted that corpus methodologies offer a valuable perspective on language, one that complements the introspective and elicited data used in different sub‐fields of linguistics. Increasingly, language corpora can be searched or downloaded over the Internet, and are now therefore very readily accessible. Many also include demographic or textual metadata that make them invaluable as data for sociolinguistics. While existing corpora may have some drawbacks (e.g. where the corpus design is not ideally suited to the study in hand, or available corpora do not have appropriate mark‐up), they offer great savings in time and effort compared to creating a new corpus. Moreover, especially given the increasing availability of spoken texts in corpora, they constitute excellent resources for students of different levels, for teachers looking for a quick way to demonstrate a feature of language, and for researchers testing linguistic hypotheses. Author Recommends: 1. Wynne, Martin. (ed.) 2005. Developing linguistic corpora: a guide to good practice. Oxford: Oxbow Books. Available online from http://ahds.ac.uk/linguistic‐corpora/. This AHDS Guide to Good Practice gives an up‐to‐date overview of many of the issues involved in creating corpora, and is essential reading for corpus users as well as for corpus creators, whether on a large or small scale. The six chapters and supplementary material are all written by experts in the topics covered, which range from metadata, spoken language corpora and annotation, to the preservation and distribution of corpora. 2. Adolphs, Svenja. 2006. Introducing electronic text analysis. London and New York: Routledge. 
This introduction takes a very practical approach to the investigation of both literary and non‐literary texts using computers, and I recommend it highly for beginners, such as undergraduates in linguistics or humanities computing. Routledge's companion website contains links to online corpora and analysis software that encourage readers to carry out their own studies inspired by the many examples in the book. 3. McEnery, Tony, Richard Xiao, and Yukio Tono. 2006. Corpus‐based language studies: an advanced resource book. London and New York: Routledge. This is a really excellent book, which gives a very broad overview of what corpus linguistics is, how corpora can be used, and the research that has been done on corpora in the past. In common with the other books in the Routledge Applied Linguistics series, it is structured around ‘Introduction’, ‘Extension’ and ‘Exploration’ sections of 6–10 units each, which combine detailed discussion, extracts from key readings in the field, and tasks for students. 4. Tagliamonte, Sali A. 2006. Analysing sociolinguistic variation. Cambridge: Cambridge University Press. Tagliamonte has created and worked with a number of corpora of English, and treats corpora here as one source of data that can be combined with various others (such as sociolinguistic interviews and elicited data) to carry out in‐depth sociolinguistic analysis. This book makes an excellent introduction to sociolinguistic methods for advanced undergraduates, and postgraduates. 5. O’Keeffe, Anne, Michael McCarthy, and Ronald Carter. 2007. From corpus to classroom: language use and language teaching. Cambridge: Cambridge University Press. This textbook draws primarily on the Cambridge and Nottingham Corpus of Discourse in English (CANCODE) and the Cambridge International Corpus. It demonstrates, through enthusiastic discussion and many examples, how corpus data can inform language teaching. 6. Sampson, Geoffrey, and Diana McCarthy. (eds). 2004. 
Corpus linguistics: readings in a widening discipline. London and New York: Continuum. This is a collection of 42 key research articles from half a century of corpus linguistics, and touches on the field from almost every possible angle. Sociolinguists will certainly find several threads to interest them, but the real strength of this book is in the convenience of having articles by so many of the most influential corpus researchers and theorists together in one volume. I recommend it highly to anyone intending to become seriously involved with corpora. 7. Biber, Douglas, Susan Conrad, and Randi Reppen. 1998. Corpus linguistics: investigating language structure and use. Cambridge: Cambridge University Press. This is a very useful introductory book, with a strong focus on investigating varieties and variation. A series of methodology boxes at the end of the book sets out important concepts such as concordancing, tagging, and statistical measures used in corpus linguistics. 8. Beal, Joan C., Karen P. Corrigan, and Hermann L. Moisl. (eds). 2007. Creating and digitizing language corpora. Volume 1: synchronic databases, Volume 2: diachronic databases. Basingstoke: Palgrave Macmillan. These two volumes bring together papers delivered at a workshop held in Newcastle in 2004, along with additional invited contributions. Both the synchronic volume and the diachronic volume contain descriptions of corpus work relevant to sociolinguists, and together give a detailed overview of the diverse work underway on what the editors call ‘unconventional’ language data, which encompass dialect material and child language among other types. The focus is largely on Europe and the USA, but extends far beyond corpora of English. 9. Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press. A classic text in corpus linguistics, which gives the reader a good understanding of not just how to do corpus linguistics but also why. 
Technology has of course moved on since this book was written, and corpora look quite different now, but the basic principles have changed little. Also recommended are John Sinclair's other works, including Trust the Text: Language, Corpus and Discourse (Routledge, 2004) and, for developing interpretative skills, Reading Concordances (Longman, 2003). 10. McEnery, Tony. 2005. Swearing in English: Bad Language, Purity and Power from 1586 to the Present. London and New York: Routledge. Naturally the subject matter here is relatively narrow, but this book is a classic demonstration of how corpus methodology can contribute to an in‐depth study of a language phenomenon. Swearing and bad language are closely correlated with social context. The principal data drawn on here by McEnery is the spoken component of the British National Corpus. 11. Anderson, Wendy, and John Corbett. To appear 2009. Exploring English with Online Corpora. Basingstoke: Palgrave Macmillan. This book is a basic introduction to the use of online corpora, for students and teachers with little or no previous knowledge. It surveys available online corpora of English, and each chapter contains a series of interactive tasks focusing on levels of language from pronunciation to discourse. Online Materials: 1. Linguist List web resources for texts and corpora http://www.linguistlist.org/sp/Texts.html Linguist List is a widely‐used portal for finding information and resources in all areas of linguistics. The site also runs a worldwide mailing list that is a first port of call for finding out about new publications, current research, jobs, and topics currently being debated. This link is to Linguist List's catalogue of text and corpus resources, including software, which is regularly maintained. 2. 
British National Corpus (BNC) http://www.natcorp.ox.ac.uk/ This is the homepage of the BNC, and contains detailed information about the corpus, its availability, and features a simple search facility that allows you to retrieve up to 50 random hits of a search term in the entire corpus or in user‐specified sub‐corpora. More complex queries, integrating part of speech information, are also possible. 3. Mark Davies’ interface to the British National Corpus http://corpus.byu.edu/bnc/ This interface, run by Professor Mark Davies at Brigham Young University, Utah, provides a very attractive and flexible way of analysing the BNC, including comparing registers, searching by part of speech, and analysing collocates. Also available from http://corpus.byu.edu are the Corpus of Contemporary American English and the TIME corpus, which use the same interface as the BNC, as well as several corpora of other languages. 4. International Corpus of English (ICE) http://ucl.ac.uk/english‐usage/ice/ The International Corpus of English (ICE) is made up of a set of 1 million‐word corpora of national or regional varieties of English that follow a common design and are therefore readily comparable. Some of the component corpora are available for download from this site; others may be obtained on CD‐Rom; a further number are in the process of creation. Some sample sound files are also available here. 5. The Newcastle Electronic Corpus of Tyneside English (NECTE) http://www.ncl.ac.uk/necte/ The Newcastle Electronic Corpus of Tyneside English is a TEI‐conformant corpus of speech spanning 30 years from the North East of England. This webpage describes the corpus and gives details of its availability. It cannot be searched online, but can be obtained free of charge by researchers. 6. Scottish Corpus of Texts & Speech (SCOTS) http://www.scottishcorpus.ac.uk SCOTS is a corpus of texts in Scottish English and varieties of Scots. 
Twenty percent of the 4 million‐word corpus is made up of spoken language, and is presented as audio or audio‐video files with searchable orthographic transcriptions. Full corpus texts are available and may be analysed with integrated search tools or downloaded. Detailed textual and demographic metadata accompany each text. 7. The IViE Corpus, English Intonation in the British Isles http://www.phon.ox.ac.uk/IViE/ IViE is an excellent resource for phonetic and phonological analysis, and was created for the Intonational Variation in English project at the University of Oxford. The data represent different levels of spontaneity – read speech data, semi‐spontaneous speech data and interactive speech (map task) – with informants from nine urban areas in the British Isles, including Dublin, Belfast, London and Newcastle. 8. Freiburg English Dialect (FRED) Corpus http://www2.anglistik.uni‐freiburg.de/institut/lskortmann/FRED/index.htm The FRED homepage contains detailed documentation on the corpus and its availability, as well as several sample texts and audio files that can be downloaded. The complete corpus samples nine dialect areas in the UK, and totals 300 hours of speech. 9. Text Encoding Initiative (TEI) http://www.tei‐c.org The TEI consortium has developed a widely‐used standard for representing texts in digital form. The TEI website includes detailed mark‐up guidelines, as well as resources for learning how to implement them. The site also maintains a list of links to projects that have used TEI mark‐up: many of these projects are good sources of texts for building corpora to use in linguistic research and teaching. 10. Text Analysis Portal for Research (TAPoR) http://taporware.mcmaster.ca/ The TAPoR text analysis tools, developed by Geoffrey Rockwell, are a suite of programmes that can be used over the web on texts or small corpora specified by the user, in plain text, XML or HTML format. 
The programmes are varied, and include tools for creating word lists, concordancing, identifying collocations, examining the distribution of words through texts, among many others. Sample Unit: Corpora and Variation: introduction: This practical course provides an introduction to corpus methodology for students familiar with the concepts and methods of studying variation, perhaps as a prelude to creating their own small corpus as data for a larger sociolinguistic study or dissertation. Most sessions are best suited to a computer classroom, in which students have access to a number of appropriate language corpora. The course can be easily modified to use corpora available to the group, for example, using only free online resources, or networked corpora available through an institutional licence. Students carry out each week's practical work in pairs or small groups: each pair or group then develops one study into a short presentation to be delivered in the final weeks of the course. syllabus: Week 1: Introduction to corpora This session concentrates on the nature of data in studying linguistic variation, considering the advantages and disadvantages of corpora over introspection, elicited data and sociolinguistic interviews. Suggested reading: Hunston, Susan. 2006. Corpus linguistics. In Brown, Keith et al. (eds). Encyclopedia of language and linguistics. Second edition. Volume 3. Amsterdam: Elsevier. 234–248. McEnery, Tony and Andrew Wilson. 2001. Corpus linguistics. 2nd edition. Edinburgh: Edinburgh University Press. Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press. Week 2: Corpora and linguistic variation This week's discussion will consider the suitability of corpora for the study of linguistic variation, with a particular focus on demographic metadata and the notion of corpus representativeness. Suggested reading: Anderson, Wendy. 2008. Corpus linguistics in the UK: resources for sociolinguistic research. 
Language and Linguistics Compass 2/2: 352–371. Bauer, Laurie. 2002. Inferring Variation and Change from Public Corpora. In J. K. Chambers, Peter Trudgill and Natalie Schilling‐Estes (eds.). The handbook of language variation and change. Oxford: Blackwell. Chambers: 97–114. Sigley, Robert. 2006. Corpora in studies of variation. In Keith Brown et al. (eds). Encyclopedia of language and linguistics. Second edition. Volume 3. Amsterdam: Elsevier. 220–226. Week 3: A general corpus of UK English – British National Corpus Individually or in pairs, students familiarize themselves with the British National Corpus and its search interface at http://corpus.byu.edu/bnc/. Example questions: •  Does the collocational patterning of the word window vary according to textual register? •  What words follow red in journalism? And in tabloid journalism specifically? Suggested reading: Anderwald, Lieselotte. 2001. Was/were‐variation in non‐standard British English today. English World‐wide 22(1): 1–21. British National Corpus, description of corpus at http://www.natcorp.ox.ac.uk/ Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press. Week 4: A general corpus of US English – Corpus of Contemporary American English This session looks at language use as represented in the Corpus of Contemporary American English (http://www.americancorpus.org/), in comparison with the British National Corpus. Example questions: •  Investigate some of the well‐known lexical differences between US and UK English. •  Compare UK and US usage of singular and plural verbs after collective nouns (e.g. The staff is/are; the team is/are) Suggested reading: Corpus of Contemporary American English, description of corpus, introduction and brief tour at http://www.americancorpus.org/ Kirkpatrick, Andy. 2007. World Englishes: Implications for International Communication and English Language Teaching. Cambridge: Cambridge University Press. 
Week 5: Studying pronunciation with the IViE corpus Using the IViE corpus (http://www.phon.ox.ac.uk/IViE/), investigate in what positions /r/ is pronounced (rhoticity) across a range of urban varieties of English in the British Isles. Suggested reading: Ladefoged, Peter & Ian Maddieson. 1996. The sounds of the world's languages. Oxford: Blackwell. Week 6: Language and age – the SCOTS corpus Using the advanced search facility of the SCOTS corpus to search the spoken component of the corpus, investigate ways in which language is influenced by the age of the speaker (http://www.scottishcorpus.ac.uk). Example questions: •  What intensifiers (e.g. very, utterly, absolutely and definitely) are typical of the speech of older and younger speakers respectively? •  Compare the use of the word‐form like by speakers of different age groups. Suggested reading: Anderson, Wendy. 2006. ‘Absolutely, totally, filled to the brim with the Famous Grouse’: Intensifying adverbs in SCOTS. English Today 22(3): 10–16. Ito, Rika and Sali Tagliamonte. 2003. Well weird, right dodgy, very strange, really cool: Layering and recycling in English intensifiers. Language in Society 32: 257–279. Levey, Stephen. 2003. He's like ‘Do it now!’ and I'm like ‘No!’, some innovative quotative usage among young people in London. English Today 19(1): 24–32. Macaulay, Ronald. 2002. Extremely interesting, very interesting, or only quite interesting? Adverbs and social class. Journal of Sociolinguistics 6(3): 398–417. Week 7: Variation within a domain – the MICASE corpus Use the full texts available in the Michigan Corpus of Academic Spoken English (MICASE, at http://quod.lib.umich.edu/m/micase/) to investigate how students and tutors ask questions of each other in different sub‐genres of academic discourse, such as lectures, seminars and office‐hour meetings. 
Suggested viewing: MICASE demos on YouTube: http://www.youtube.com/watch?v=dQEsX8p0wtY Week 8: Variation over time – the TIME corpus Using the TIME corpus, available from http://corpus.byu.edu/time, investigate some words whose meaning changes over time. You will find various suggestions of interesting words in the information accompanying the corpus, created by Mark Davies. Suggested reading: Leech, Geoffrey and Nicholas Smith. 2005. Extending the possibilities of corpus‐based research on English in the twentieth century: a prequel to LOB and FLOB. ICAME Journal 29: 83–98. Week 9: Group presentations of chosen sociolinguistic investigation – I In groups, students should choose one of the corpora used in Weeks 3–8, carry out a small study which uses this corpus as data, and present their findings to the rest of the class. Week 10: Group presentations of chosen sociolinguistic investigation – II As Week 9 above. Focus Questions: 1 What practical and theoretical challenges do corpus compilers face, and why? 2 How important is metadata to a corpus? 3 What sorts of linguistic information can be obtained from large text corpora that cannot easily be obtained from sociolinguistic interviews? 4 What issues arise in using corpora to investigate the relationship between language and society? 5 For what sorts of research is a grammatically tagged corpus useful, and why is there no single system for marking up corpora? 6 For either a written text or passage of spoken language, what are the contextual factors that have an influence on the language used? 7 What genres of text would you expect to find in a general corpus of a national standard variety of English? Seminar/Project Idea: Group Project: Language and society in corpora Identify a feature of language that seems to you to be particular to a variety of English with which you are familiar, or which you believe is typical of a type of speaker (e.g. age group, gender or profession). 
Depending on the corpora available, you might think about features of lexis, phraseology, grammar or pronunciation. Investigate the feature you have identified using an available corpus. Make sure to consider how appropriate your chosen corpus is for the study, and therefore how generalizable the findings of your analysis are. Is the feature used throughout the variety, or does its use appear to be affected by other factors, such as the level of formality, textual register, or speaker‐related variables?</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Royaume-Uni</li>
</country>
<region>
<li>Écosse</li>
</region>
<settlement>
<li>Glasgow</li>
</settlement>
<orgName>
<li>Université de Glasgow</li>
</orgName>
</list>
<tree>
<country name="Royaume-Uni">
<region name="Écosse">
<name sortKey="Anderson, Wendy" sort="Anderson, Wendy" uniqKey="Anderson W" first="Wendy" last="Anderson">Wendy Anderson</name>
</region>
</country>
</tree>
</affiliations>
</record>
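
As a minimal sketch of how the `<affiliations>` block of the record above might be read programmatically, the Python below parses a copy of that fragment with the standard library's `xml.etree.ElementTree`. The names `AFFILIATIONS_XML` and `read_affiliations` are illustrative assumptions and are not part of Dilib or Wicri; only the element and attribute names come from the record itself.

```python
import xml.etree.ElementTree as ET

# Copy of the <affiliations> fragment from the record (whitespace condensed).
AFFILIATIONS_XML = """
<affiliations>
<list>
<country><li>Royaume-Uni</li></country>
<region><li>Écosse</li></region>
<settlement><li>Glasgow</li></settlement>
<orgName><li>Université de Glasgow</li></orgName>
</list>
<tree>
<country name="Royaume-Uni">
<region name="Écosse">
<name sortKey="Anderson, Wendy" sort="Anderson, Wendy" uniqKey="Anderson W" first="Wendy" last="Anderson">Wendy Anderson</name>
</region>
</country>
</tree>
</affiliations>
"""


def read_affiliations(xml_text):
    """Return (facets, authors) from an <affiliations> fragment.

    facets maps each child of <list> (country, region, ...) to its <li> values.
    authors lists (country, region, author name) tuples found under <tree>.
    This hypothetical helper only handles the nesting seen in this record.
    """
    root = ET.fromstring(xml_text.strip())

    facets = {}
    for facet in root.find("list"):
        facets[facet.tag] = [li.text for li in facet.findall("li")]

    authors = []
    for country in root.find("tree").findall("country"):
        for region in country.findall("region"):
            for name in region.findall("name"):
                authors.append(
                    (country.get("name"), region.get("name"), name.text)
                )
    return facets, authors


facets, authors = read_affiliations(AFFILIATIONS_XML)
print(facets)
print(authors)
```

The `<list>` part gives flat facet values, while the `<tree>` part ties each author to a country and region; records with a different nesting would need a more general traversal.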

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000097 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000097 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:1570F6442817FA2766D019A0CACE13E47E28832A
   |texte=   Teaching & Learning Guide for: Corpus Linguistics in the UK: Resources for Sociolinguistic Research
}}


This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024